Evolved DNA Duplex Readers for Strand-Asymmetrically Modified 5-Hydroxymethylcytosine/5-Methylcytosine CpG Dyads

5-Methylcytosine (mC) and 5-hydroxymethylcytosine (hmC), the two main epigenetic modifications of mammalian DNA, exist in symmetric and asymmetric combinations in the two strands of CpG dyads. However, revealing such combinations in single DNA duplexes is a significant challenge. Here, we evolve methyl-CpG-binding domains (MBDs) derived from MeCP2 by bacterial cell surface display, resulting in the first affinity probes for hmC/mC CpGs. One mutant has low nanomolar affinity for a single hmC/mC CpG, discriminates against all 14 other modified CpG dyads, and rivals the selectivity of wild-type MeCP2. Structural studies indicate that this protein has a conserved scaffold and recognizes hmC and mC with two dedicated sets of residues. The mutant allows us to selectively address and enrich hmC/mC-containing DNA fragments from genomic DNA backgrounds. We anticipate that this novel probe will be a versatile tool to unravel the function of hmC/mC marks in diverse aspects of chromatin biology.


… duplex with similar affinity as the duplex in question, but labeled with a different fluorophore, i.e., "red dots in green gate". The false positive rate is a worst-case lower bound since, in reality, the gating can be constrained in the direction of the second fluorophore as well.

… genotypes, 6,000 were observed in both sequencing replicates (data points shown for these only), indicating >87% unique genotypes; Pearson's correlation coefficient given. The sampled genotypes coded for 33,900 distinct phenotypes with 7.8% amber (TAG) nonsense mutants (expected: 11.9%), 88.5% missense mutants (expected: 88.1%) and 3.7% wildtype MeCP2 (expected: <0.01%). The number of stop codons disallowed by the used NNK mutagenesis (TAA, TGA) was <0.09%.

… Table S8 and Figure 4d) and third (dashed) order polynomial fits, macroscopic binding constants, observed data (points).

… interacting with mC/hmC-dT* DNA duplex (o3115/o4328). Distances between 5 and 6 nm found in both measurements were reviewed in a control experiment using wt-MeCP2 bound to singly labeled DNA (mC/mC-dT*), see Figure S10.

Tables

Table S1. Plasmids.

pBeB1383 pET

Table S3. ODNs for FACS.
An asterisk in the oligonucleotide sequence indicates a 3'-5' phosphorothioate linkage.

Table S5. ODNs used in the qPCR assay.
An asterisk in the oligonucleotide sequence indicates a 3'-5' phosphorothioate linkage.

To calculate the abundance of a genotype or the respective phenotype, the fraction of UMI counts within the number of distinct sequences was used. For genotypes or phenotypes not present in the sequencing of the initial library (due to undersampling of more than 1 million clones with 2 x 50,000 reads), the original abundance was assumed to be 0.9 / (# distinct seq.), which is likely an overestimate in most cases (Figure S4b). The amino acid enrichment per position was determined from the total of distinct codon-UMI combinations.

Electrophoretic mobility shift assay (EMSA). EMSAs were carried out as described previously using the ODN pairs given in Table S6. 2 The MBP-tag was removed prior to the assay by treating a 15 µM MBD sample with 0.25 µM TEV protease (about 1 µg per mL aliquot) at 4 °C overnight.
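The abundance calculation described above (UMI fractions, with a pseudo-abundance for genotypes missing from the initial library) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the toy genotypes, and the definition of enrichment as a simple ratio of selected to initial abundance are assumptions. The "number of distinct sequences" is read here as the total number of distinct genotype-UMI combinations in a sample.

```python
def abundances(umi_counts):
    """Fraction of distinct UMIs per genotype (or phenotype) in one sample."""
    total = sum(umi_counts.values())  # assumed: total = number of distinct sequences
    return {genotype: n / total for genotype, n in umi_counts.items()}


def initial_abundance(genotype, initial_counts):
    """Observed initial fraction, or 0.9 / (# distinct seq.) if the genotype was not sampled."""
    n_distinct = sum(initial_counts.values())
    if genotype in initial_counts:
        return initial_counts[genotype] / n_distinct
    # Slightly less than one observation: likely an overestimate for rare clones.
    return 0.9 / n_distinct


def enrichment(selected_counts, initial_counts):
    """Assumed definition: abundance after selection divided by initial abundance."""
    selected = abundances(selected_counts)
    return {
        genotype: frac / initial_abundance(genotype, initial_counts)
        for genotype, frac in selected.items()
    }


# Hypothetical toy data: UMI counts per genotype before and after sorting.
initial = {"AAA": 6, "CCC": 4}
selected = {"AAA": 1, "GGG": 9}
result = enrichment(selected, initial)
```

Because the pseudo-abundance is just under one observation, genotypes that were not sampled in the initial library receive a conservative (lower-bound) enrichment under this ratio definition.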

Recombinant
Determination of binding affinity with single band shift. The gel shift data were quantified with ImageQuant TL v8.1 1D Gel Analysis (GE Healthcare) using rubber band background subtraction and manual peak detection with approximately equal peak areas for each band across all lanes. The exported data were curated and analyzed with R v4.0.1 using the Levenberg-Marquardt nonlinear least-squares algorithm 10 to fit the following model: 11 [RL] / …

If the microscopic binding constant k_a for occupying the first (highest-affinity) site a is sufficiently different from those for binding the second, third, or further sites b, c, etc., then K_1 is a good estimator for k_a, provided there is no or negligibly little cooperativity between the binding sites (a reasonable assumption for MBD binding in this case). The fitting was implemented such that all K_n with n > 1 were shared among DNA probes with the same sequence (representing the same additional binding sites b, c, etc.) and only the K_1 values were specific for the differentially modified CpGs. Model estimates are given in Table S8. This model fitting (and some additional microscopic model fitting) was implemented in summerrband (DOI: 10.5281/zenodo.5501758).
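To illustrate how a probe-specific K_1 can be estimated while the higher-order constants are held in common, the sketch below evaluates an Adair-type binding polynomial in stepwise macroscopic association constants. The model equation itself is not legible in this text, so this functional form is an assumption, as are all function names and numbers. The authors fitted in R with the Levenberg-Marquardt algorithm. For brevity this Python sketch swaps in a coarse least-squares scan over log10(K_1) and treats the shared constants as already known.

```python
import math


def fraction_bound(protein_conc, stepwise_K):
    """Fraction of DNA with at least one protein bound (assumed Adair-type polynomial)."""
    term, poly = 1.0, 1.0
    for K in stepwise_K:
        term *= K * protein_conc  # K1*L, then K1*K2*L^2, ...
        poly += term
    return (poly - 1.0) / poly


def fit_K1(concs, observed, shared_higher_K, log10_range=(-4.0, 2.0), steps=6001):
    """Scan log10(K1) for the least-squares optimum; higher-order constants stay fixed."""
    lo, hi = log10_range
    best_K1, best_sse = None, math.inf
    for i in range(steps):
        K1 = 10 ** (lo + (hi - lo) * i / (steps - 1))
        sse = sum(
            (fraction_bound(c, [K1] + list(shared_higher_K)) - y) ** 2
            for c, y in zip(concs, observed)
        )
        if sse < best_sse:
            best_K1, best_sse = K1, sse
    return best_K1


# Hypothetical titration (concentrations in nM, constants in 1/nM).
concs = [1.0, 3.0, 10.0, 30.0, 100.0, 300.0]
true_K = [0.1, 0.001]
observed = [fraction_bound(c, true_K) for c in concs]
K1_hat = fit_K1(concs, observed, shared_higher_K=[0.001])
```

In a joint fit, the shared constants would be free parameters estimated simultaneously across all probes of the same sequence; fixing them here only keeps the example short.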
Site-directed spin labeling of proteins. The protein constructs containing engineered cysteines were thawed and incubated with a 3-fold molar excess of tris(2-carboxyethyl)phosphine (TCEP, Sigma Aldrich) at room temperature for 30 min. Afterwards, TCEP was removed using Zeba™ Spin Desalting Resin (7K MWCO, Thermo Fisher). A 6-fold molar excess of (1-oxyl-2,2,5,5-tetramethylpyrroline-3-methyl) methanethiosulfonate (MTSL, Enzo Life Sciences) in DMSO was added to the protein samples, and the mixture was incubated overnight at 4 °C and 300 rpm (ThermoMixer C, Eppendorf).

… chemical shifts were referenced to the external standard 2,2-dimethyl-2-silapentane-5-sulfonate (DSS), while 15N and 13C chemical shifts were calibrated indirectly. The near-complete 1H, 13 … shifts, respectively. The factor of 10 for the 15N chemical shift was taken as the normalization factor since the range of nitrogen chemical shifts is approximately ten times that of proton chemical shifts for the backbone amides in folded proteins.
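The role of the factor 10 can be illustrated with a commonly used combined amide chemical-shift perturbation, in which the 15N difference is divided by 10 before it is combined with the 1H difference. The exact expression used here is not legible in this text, so the Euclidean form below is an assumption, as are the function name and the example values.

```python
import math


def combined_csp(delta_h_ppm, delta_n_ppm, n_scale=10.0):
    """Combined 1H/15N perturbation; the 15N difference is scaled down by n_scale."""
    return math.sqrt(delta_h_ppm ** 2 + (delta_n_ppm / n_scale) ** 2)


# Hypothetical residue: 0.03 ppm shift in 1H and 0.4 ppm shift in 15N.
csp = combined_csp(0.03, 0.4)
```

With this scaling, a 0.4 ppm change in 15N contributes as much as a 0.04 ppm change in 1H, so neither nucleus dominates the combined value.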
Neighbor-corrected structural propensity and chemical-shift perturbation calculation. Secondary structural elements of both proteins were assessed via NMR chemical shifts of 15N, 13Cα, 13Cβ, 1H, and 13CO in the framework of neighbor-corrected structural-propensity prediction. 20

Preparation of spike-in probes. Spike-in probes are ligation products of a DNA duplex carrying the modified CpGs ("carrier") and a DNA duplex with a general and a unique primer binding site for quantitation by qPCR ("adapter"). Adapters were created by primer extension of the primer pairs o4371/o4372, o4373/o4374, and o4392/o4393 (Table S5) …

… room temperature, then placed on ice; the MethylCap kit protein was treated likewise). The capture reaction was allowed to take place over 2 hours (it can also be performed overnight) in 200 µL tubes on a rotating wheel at 4 °C before 10 µL of the supplied GSH-coated magnetic beads were added. The MBD-DNA complexes were immobilized for 1 h. Then, each reaction was washed with 1 x 40 µL "Wash Buffer 1" and 2 x 40 µL "Wash Buffer 2" for 5 min with shaking at 950 rpm at 16 °C before the DNA was recovered at once in 30 µL "High Elution Buffer" for 10 min. The supernatant (or 10% input diluted in elution buffer) was column-purified using a commercial kit (Macherey-Nagel, Düren, Germany) and recovered in 2 x 20 µL (50,000 copies samples) or 2 x 10 µL (5,000 copies samples) 5 mM Tris-HCl, pH 8.5. qPCR measurements were made in duplicate for each sample and each target (for optimized conditions, see "Preparation of spike-in probes"). Individual spike-in concentrations were determined relative to a dilution series of pure spike-in probes, and the recovery was determined relative to the input.
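The quantitation step above (spike-in copies read off a dilution series, recovery expressed relative to the input) can be sketched as follows. This is an illustrative reconstruction rather than the authors' analysis: the log-linear standard curve, the function names, and all Cq values are assumptions. The captured and input samples are assumed to be eluted in equal volumes, and the input sample is assumed to represent 10% of the material used in the capture.

```python
import math


def fit_standard_curve(copies, cq_values):
    """Ordinary least-squares line of Cq against log10(copies) for a dilution series."""
    x = [math.log10(c) for c in copies]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(cq_values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, cq_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_cq(cq, slope, intercept):
    """Invert the standard curve to obtain a copy number."""
    return 10 ** ((cq - intercept) / slope)


def recovery(cq_captured, cq_input, slope, intercept, input_fraction=0.1):
    """Captured copies divided by total input copies (input sample = 10% of total)."""
    captured = copies_from_cq(cq_captured, slope, intercept)
    total_input = copies_from_cq(cq_input, slope, intercept) / input_fraction
    return captured / total_input


# Hypothetical standard curve with a slope of -3.32 cycles per decade.
standards = [1e2, 1e3, 1e4, 1e5]
standard_cq = [36.64 - 3.32 * math.log10(c) for c in standards]
slope, intercept = fit_standard_curve(standards, standard_cq)

# Hypothetical sample: 5,000 copies captured; the 10% input aliquot holds 1,000 copies.
cq_captured = 36.64 - 3.32 * math.log10(5000)
cq_input = 36.64 - 3.32 * math.log10(1000)
fraction_recovered = recovery(cq_captured, cq_input, slope, intercept)
```

Each spike-in target would get its own standard curve, since amplification efficiency can differ between primer pairs.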